AUTOMATIC INTERACTION DETECTOR-VERSION 4 (AID-4)
REFERENCE MANUAL ADDENDUM 1


I.  INTRODUCTION 

AID-4, an acronym for automatic interaction detector-version 4, is a computer-based regression
model building algorithm which was originally made operational on an IBM 7040 computer system at the
Computational Sciences Division, Air Force Human Resources Laboratory (AFHRL), Lackland Air Force
Base, Texas. The background, basic algorithm, and technical details required for use of AID-4 are described
by Koplyay, Gott, and Elton (1973).

Subsequent acquisition of a UNIVAC 1108 computer system by the Computational Sciences
Division allowed for the expansion of AID-4 capabilities. This report serves as an addendum to the original
reference manual mentioned previously. This addendum contains three sections which describe changes to
the IBM 7040 version of the program, technical notes on certain aspects of the program not mentioned
before, and information to assist users on the UNIVAC 1108 system. Specifications provided should also
prove useful for implementing AID-4 on other computer systems.


II. CHANGE OF LIMITATIONS OF THE AID-4 COMPUTER ALGORITHM
ADAPTED TO THE UNIVAC 1108 (Reference AFHRL-TR-73-17)

1. All references in AFHRL-TR-73-17 to the upper range of the recode categories should be
changed from (0-39) to (0-49); i.e., the total number of categories per variable changed from 40 to 50. The
following pages are affected:

Page 16  Last 3 lines
Page 19  4th line from bottom of page
Page 22  Middle of page (3d and 4th lines from para III)
Page 23  4th and 5th lines from bottom of page
Page 28  Limitation No. 8
Page 29  Error No. 30
Page 30  In the formulas in Adoption Note No. 8, change ln 40 to ln 50 in the denominators and
         the results; i.e., replace 6.58 = 6, 5.82 = 5, and 11.086 = 11 with 6.20 = 6, 5.49 = 5, and
         10.45 = 10, respectively.

2. Change all references in AFHRL-TR-73-17 to the maximum number of input variables from 80
to 300. The following pages are affected:

Page 14  Under Card Columns 27-29 and 30-32, change 83 to 303
Page 28  Under Limitations, change 83 to 303 (para 1) and 80 to 300 (para 2)
Page 29  Error No. 38: change 83 to 303

3. The references in AFHRL-TR-73-17 to the total number of categories of a given problem require
the following changes:

Page 28  Limitation No. 3: change 700 to 2500
Page 29  Errors No. 26, 27, and 31: change 700 to 2500

4. There are numerous changes to the FORTRAN listing appearing in AFHRL-TR-73-17 which
were necessary to implement AID-4 on the UNIVAC 1108. These changes do not affect the user and will
not be listed. Revised FORTRAN source program listings could be provided to qualified users upon
justified request.
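
The Page 30 change in item 1 (ln 40 replaced by ln 50 in the denominators) can be checked numerically. The short Python sketch below assumes that the numerators of the Adoption Note No. 8 formulas are unchanged, so each new result is the old result multiplied by ln 40 / ln 50; the function name `rescale` is hypothetical and is not part of AID-4.

```python
import math

# Results printed on page 30 of the original manual (computed with ln 40).
old_results = [6.58, 5.82, 11.086]


def rescale(old_value):
    """Assumed relation: same numerator, denominator changed from ln 40 to ln 50."""
    return old_value * math.log(40) / math.log(50)


# Rounded to two decimals, as the results are printed in the addendum.
new_results = [round(rescale(v), 2) for v in old_results]

# Integer parts, corresponding to the "= 6", "= 5", "= 10" notation.
truncated = [math.floor(v) for v in new_results]

print(new_results)
print(truncated)
```

Under that assumption the rescaled values agree with the replacement figures listed for Page 30.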


III. AID-4 RANDOM SELECTION PROCESS CORRECTION AND TECHNICAL DESCRIPTION

1. The program listing of AID-4 given in AFHRL-TR-73-17 has an error on line AID06480 (page 51
of AFHRL-TR-73-17). Line AID06480 reads

      PRO1 = FLOAT(NSAB-NSEL)/FLOAT(NC-NIN+1)

The above statement should be changed to

      PRO1 = FLOAT(NSAB-NSEL-NOUT)/FLOAT(NC-NIN+1)

2. The random selection process used in AID-4 when a user specifies IRUN = 2 or 3 (on the title
card) employs the same concepts as the following “balls from an urn” analogy, where N, N_A, and N_B in the
urn problem are analogous to

N   = number of cases in the original input file,

N_A = number of cases requested by the user to be randomly selected for assignment to sample
      A, and

N_B = number of cases requested by the user to be randomly selected for assignment to sample
      B.

Select balls one at a time without replacement from an urn which initially contains N balls, of which
N_A are labeled A, N_B are labeled B, and the remainder, if any, are unlabeled. If i balls have been selected,
where n_A were labeled A, n_B were labeled B, and i >= n_A + n_B, then the probability that the next ball selected
is labeled A or B is

p(A or B) = [N_A + N_B - (n_A + n_B)] / (N - i)

The probability that the next ball selected is labeled A is

p(A) = (N_A - n_A) / (N - i)

AID-4 makes its assignments to sample A and sample B by generating a uniform random deviate r in
the interval (0,1) and by applying it to the following logic sequence:

(a) Assign case to sample A if r < p(A) and case does not contain out-of-range data; therefore, if the
case is assigned to sample A, n_A = n_A + 1.

(b) Assign case to sample B if p(A) <= r < p(A or B) and case does not contain out-of-range data;
therefore, if the case is assigned to sample B, n_B = n_B + 1.

(c) Case is not assigned to a sample if r >= p(A or B).

NOTE: If the case is not assigned to a sample because it contains out-of-range data, the following
computation is performed to approximate the number of cases in sample A and sample B:

n_A = n_A + N_A/(N_A + N_B)   and   n_B = n_B + N_B/(N_A + N_B)
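
The selection logic described in this section can be sketched in a few lines of Python. This is an illustration only, not the AID-4 FORTRAN code: the function name `select_samples` and the predicate `in_range` are hypothetical, and the sketch assumes that the fractional update in the NOTE is applied only when the random deviate would have placed an out-of-range case in sample A or B.

```python
import random


def select_samples(cases, n_a_req, n_b_req, in_range, rng=None):
    """Sequentially assign cases to samples A and B (urn analogy sketch)."""
    if n_a_req + n_b_req <= 0:
        raise ValueError("at least one case must be requested")
    rng = rng or random.Random()
    total = len(cases)
    n_a = 0.0  # running count for sample A (may become fractional)
    n_b = 0.0  # running count for sample B (may become fractional)
    share_a = n_a_req / (n_a_req + n_b_req)
    sample_a, sample_b = [], []

    for i, case in enumerate(cases):
        remaining = total - i  # N - i in the text
        p_a = (n_a_req - n_a) / remaining
        p_ab = (n_a_req + n_b_req - (n_a + n_b)) / remaining
        r = rng.random()

        if r >= p_ab:
            continue  # step (c): case is not assigned
        if not in_range(case):
            # NOTE in the text: approximate the counts instead of assigning.
            n_a += share_a
            n_b += 1.0 - share_a
            continue
        if r < p_a:
            sample_a.append(case)  # step (a)
            n_a += 1.0
        else:
            sample_b.append(case)  # step (b)
            n_b += 1.0
    return sample_a, sample_b


# Usage: 100 cases, none out of range, 30 requested for A and 20 for B.
a, b = select_samples(list(range(100)), 30, 20, lambda c: True, random.Random(0))
print(len(a), len(b))
```

When no case is out of range, the probabilities reach 1 as soon as the cases still needed equal the cases remaining, so the requested sample sizes are met exactly. When out-of-range cases occur, the samples come out smaller than requested, which is what the NOTE approximates.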


IV. SUPPLEMENTAL INFORMATION FOR RUNNING AID-4 ON THE UNIVAC 1108


1. Runstreams for the UNIVAC 1108.

The following card sequence is required to use the AID-4 program as it is operational on the
AFHRL UNIVAC 1108 computer. (Other computer systems will require different system runstreams.) The
files involved in the run may be either tape or mass storage. The cataloging options (as required by tape or
mass storage) will be supplied by the user.

Note: Several types of runs may be performed by AID-4 depending upon the value of “IRUN” in
column 50 of the title/parameter card. These runs may be briefly described as follows:

IRUN=0  Use every case in the original input file for a normal AID-4 run; no forced splitting.

IRUN=1  Select a random sample A from the original input file, and use only the cases that belong
        to A for a normal AID-4 run.

IRUN=2  Select a random sample A and a random sample B from the original input file, and use
        only those cases that belong to A for a normal AID-4 run. Then force those cases that
        belong to B to make the same splits as taken by A; i.e., single cross-validation.

IRUN=3  Select a random sample A and a random sample B from the original input file, and use
        only the cases that belong to A for a normal AID-4 run. Then force those cases that
        belong to B to make the same splits as taken by A. Then use only those cases that belong
        to B for a normal AID-4 run, and force those cases that belong to A to make the same
        splits as taken by B; i.e., double cross-validation.

IRUN=4  Given sample A and sample B (no random selection by the program), use only the cases
        that belong to A for a normal AID-4 run. Then force those cases that belong to B to make
        the same splits as taken by A; i.e., single cross-validation. Note that double
        cross-validation can be accomplished by submitting a second job with samples A and B
        switched.
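
The five run types above differ only in whether the program draws the samples at random and in which sample builds the tree and which sample is forced through the same splits. The Python sketch below records that as a lookup table; the names `RunPlan`, `IRUN_PLANS`, and `describe` are hypothetical and serve only to summarize the descriptions, not to reproduce AID-4 internals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RunPlan:
    random_selection: bool  # True if the program draws the sample(s) itself
    # Each pass is (sample used for the normal run, sample forced to follow it).
    passes: Tuple[Tuple[str, Optional[str]], ...]


IRUN_PLANS = {
    0: RunPlan(False, (("ALL", None),)),
    1: RunPlan(True, (("A", None),)),
    2: RunPlan(True, (("A", "B"),)),               # single cross-validation
    3: RunPlan(True, (("A", "B"), ("B", "A"))),    # double cross-validation
    4: RunPlan(False, (("A", "B"),)),              # samples supplied by the user
}


def describe(irun):
    """Return one line of text per pass for the given IRUN value."""
    lines = []
    for builder, forced in IRUN_PLANS[irun].passes:
        text = f"normal run on {builder}"
        if forced is not None:
            text += f", forced splits on {forced}"
        lines.append(text)
    return lines


for value in sorted(IRUN_PLANS):
    print(value, describe(value))
```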

1.1 IRUN=0, 1, 2, or 3.

Order  Type

 1.    @RUN RUN-ID,Job,Section
 2.    @ASG,A DATA                ) Alternate data file if
 3.    @USE 10,DATA               ) data is not on cards
 4.    @ASG,T TEMP-1.,-           ) Scratch file
 5.    @USE 11,TEMP-1             )
 6.    @ASG,T TEMP-2.,-           ) Scratch file
 7.    @USE 12,TEMP-2             )
 8.    @ASG,T TEMP-3.,-           ) Sample A
 9.    @USE 13,TEMP-3             ) or total
10.    @ASG,T TEMP-4.,-           ) Scratch file
11.    @USE 14,TEMP-4             )
12.    @ASG,T TEMP-5.,-           ) TREE PLOT
13.    @USE 15,TEMP-5             ) information
14.    @ASG,T TEMP-6.,-           ) Sample B (2, 3) or
15.    @USE 18,TEMP-6             ) Optional Residuals (0, 1)
16.    @XQT T*T.AID4
17.    Title (Parameters) Card
18.    Data Format Card(s)
19.    Description Card
20.    Predictor Card(s); at least one per predictor
21.    Criterion Card
22.    Data (if on cards, otherwise on U10)
23.    End-of-job Card
24.    @FIN


1.2 IRUN=4.

Order  Type

 1.    @RUN RUN-ID,Job,Section
 2.    @ASG,A SAMP-A              ) Sample A
 3.    @USE 10,SAMP-A             ) Data file
 4.    @ASG,T TEMP-1.,-           ) Scratch file
 5.    @USE 11,TEMP-1             )
 6.    @ASG,T TEMP-2.,-           ) Scratch file
 7.    @USE 12,TEMP-2             )
 8.    @ASG,T TEMP-3.,-           ) Scratch file
 9.    @USE 13,TEMP-3             )
10.    @ASG,T TEMP-4.,-           ) Scratch file
11.    @USE 14,TEMP-4             )
12.    @ASG,T TEMP-5.,-           ) TREE PLOT
13.    @USE 15,TEMP-5             ) information
14.    @ASG,A SAMP-B              ) Sample B
15.    @USE 18,SAMP-B             ) Data file
16.    @XQT T*T.AID4
17.    Title (Parameters) Card
18.    Data Format Card(s)
19.    Description Card
20.    Predictor Card(s); at least one per predictor
21.    Criterion Card
22.    End-of-job Card
23.    @FIN

2.  File  Flowcharts. 

Note: All unmentioned files are “scratch” (temporary) files used in either “split” or “forced split.”

2.1 IRUN=0 and IRUN=1.




2.4 IRUN=4. (Single cross-validation given samples A and B)